Using a Genetic Algorithm Approach to Study the Impact of Imbalanced Corpora in Sentiment Analysis

Authors

  • Lohann Ferreira
  • Mariza Dosciatti
  • Júlio C. Nievola
  • Emerson Cabrera Paraiso
Abstract

The SVM classifier has been used in many methods to identify emotions in text because of its good generalization capability and its robustness on high-dimensional data. However, most textual corpora subjected to such methods are naturally imbalanced. As a consequence, the SVM, which is sensitive to imbalanced data, assigns most texts to the majority class. In this article, we present a Genetic Algorithm based approach that aims to reduce the imbalance of the data in the context of emotion identification. This approach allowed us to study the impact of its application in a method of emotion identification in texts written in Brazilian Portuguese. Experiments showed that balancing the corpus could be an alternative when using the SVM classifier for emotion identification, especially in a multiclass configuration.
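
To make the idea of GA-based balancing concrete, here is a minimal Python sketch. The abstract does not describe the chromosome encoding, the fitness function, or the genetic operators, so everything below is an assumption made for illustration: each individual is a boolean keep/discard mask over the corpus, the fitness rewards balanced class counts with a small penalty for discarded texts, and the class labels are invented toy data. A more faithful setup would likely score each candidate subset by the performance of an SVM trained on it; that step is omitted here to keep the example self-contained.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def fitness(mask, labels):
    """Assumed fitness: prefer balanced classes, lightly penalize discarded texts."""
    kept = [y for keep, y in zip(mask, labels) if keep]
    if len(set(kept)) < len(set(labels)):
        return float("-inf")  # reject masks that remove a class entirely
    discarded = 1 - len(kept) / len(labels)
    return -imbalance_ratio(kept) - 0.1 * discarded


def evolve(labels, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Evolve a keep/discard mask over the corpus; returns the best mask found."""
    rng = random.Random(seed)
    n = len(labels)
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    population[0] = [True] * n  # seed with the unmodified corpus as a baseline
    for _ in range(generations):
        scored = sorted(population, key=lambda m: fitness(m, labels), reverse=True)
        next_gen = scored[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(scored, 3), key=lambda m: fitness(m, labels))
            p2 = max(rng.sample(scored, 3), key=lambda m: fitness(m, labels))
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda m: fitness(m, labels))


# Hypothetical toy corpus labels (not taken from the paper).
labels = ["neutral"] * 40 + ["joy"] * 8 + ["anger"] * 6
best = evolve(labels)
kept = [y for keep, y in zip(best, labels) if keep]
print(Counter(labels), "->", Counter(kept))
```

Because the unmodified corpus is placed in the initial population and the best masks are carried over each generation, the returned mask is never more imbalanced than the original label distribution under this assumed fitness.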

Similar resources

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...

Mining Interesting Aspects of a Product using Aspect-based Opinion Mining from Product Reviews (RESEARCH NOTE)

As the internet and its applications grow, E-commerce has become one of its most rapidly growing applications. E-commerce customers can express their opinions about a product on the web as text in the form of reviews. In previous studies, merely finding the sentiment of reviews was not sufficient to capture the exact opinion of the review. In this paper, we have used A...

An empirical study on statistical analysis and optimization of EDM process parameters for inconel 718 super alloy using D-optimal approach and genetic algorithm

Among the several non-conventional processes, electrical discharge machining (EDM) is the most widely and successfully applied for the machining of conductive parts. In this technique, the tool has no mechanical contact with the workpiece, and the hardness of the workpiece has no effect on the machining pace. Hence, this technique could be employed to machine hard materials such as super allo...

Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding the correct opinion or interest from multi-faceted sentiment data is a tedious task. In this paper, a method is proposed to improve sentiment accuracy by using a categorized dictionary for sentiment classification and analysis. A categorized dictiona...

Improvement of Surface Finish when EDM AISI 2312 Hot Worked Steel using Taguchi Approach and Genetic Algorithm

Nowadays, Electrical Discharge Machining (EDM) has become one of the most extensively used non-traditional material removal processes. Its unique feature of using thermal energy to machine hard-to-machine, electrically conductive materials is its distinctive advantage in the manufacturing of moulds, dies and aerospace components. However, EDM is a costly process and hence proper selection of its ...

Publication date: 2015